Improving Web Database Access Using Decision Diagrams* 
Abstract

In some areas of management and commerce, especially Electronic commerce (E-commerce), whose growth is accelerated by advances in Web technologies, it is essential to support the decision making process using formal methods. Among the problems of E-commerce applications are reducing data access time, so that huge databases can be searched quickly, and decreasing the cost of database design. We present the application of Decision Diagram design using an Information Theory approach to improve database access speeds. We show that this approach provides systematic and visual ways of applying Decision Making methods to simplify complex Web engineering problems.



1 Introduction 

In this paper, we present the application of Deci- 
sion Making methods to solve the problem of optimizing 
database access. At present, developments in Decision 
Making and Logic Design present new opportunities to pro- 
vide database designers with computer-generated represen- 
tations of their problems [|lj |j] . Effective use of these ca- 
pabilities requires managing how information is extracted 
from databases and using visual displays in order to enhance 
human performance in design tasks. Research on data rep- 
resentations is fundamental to the progress in optimization 
of interactive database applications [||]. 

Database access optimizers are key tools that allow modern Web services to achieve high performance. Such an optimizer chooses an optimal strategy for query processing from among alternative ones. Commercial database systems have incorporated access optimizers in the last decade. However, new interest in optimal sequences of queries for knowledge discovery, on-line interactive services and complex multi-media objects has caused renewed research in optimization. Existing database access optimizers have proved inadequate to the needs of these applications [^, H].

The user interacting with an E-commerce application has a number of alternatives of which one must be chosen. The objective is to choose the best alternative (product/service) as a result of a sequence of decisions [|^]. When a situation requires a series of decisions, a decision table approach cannot accommodate the multiple layers of decision making. Thus, a graph-based approach is needed. Decision Trees (DTs) and their extension, Decision Diagrams (DDs), can describe these situations and add structure to the problem. DDs require less memory for representation than DTs since a DD is a reduced DT [[lj, [l(]]. DDs provide an effective method of decision making because they lay out the problem clearly, so that all choices can be viewed, discussed and challenged, and provide a framework to quantify the values of outcomes.

Most of the tools of modern research in optimization of 
Web database access - not only querying theory but also 
DTs, DDs and other widely used techniques - use the as- 
sumption of maximizing the achievement of some goal un- 
der specified constraints, and presume that all alternatives 
are known [^[j. These tools have proven their usefulness in 
a wide variety of applications. We consider DD representa- 
tion of a Web-linked database using Information Theoretic 
approach to minimize the uncertainty through optimiza- 
tion which becomes a proper heuristic to extract knowledge 
from the Web [||]. Our previous results explore utilization 
of DTs for optimizing interactive network services |J . 

The rest of the paper is organized as follows. In Section 2, we review database notation, introduce basic terminology, and state the key assumptions of our work. In Section 3, we describe DDs and information theory concepts, and show the relation between DDs and database information. Then, we describe the algorithms to optimize database access in Section 4. In Section 5, we present case-study results on decision making benchmarks, and we conclude in Section 6.

Figure 1. An example of cars database

2 Database Records and Queries 

Web E-commerce applications are based on interactive queries that explore certain products stored in a Web-linked database. Two different principles are used when producing a query: (i) generating queries with difficult and sometimes non-trivial questions that take a lot of time to answer; (ii) generating simple queries that contain questions with possible alternative answers. We employ the second principle and propose a new optimization strategy to further improve performance during the execution of queries. The optimal sequence of queries can be determined using DDs and information theoretic measures.

Example 1. To illustrate optimization of Web database access, the following example of Internet Shopping is used. We have chosen the demo cars database (see footnote) to be used by a hypothetical company that sells cars on the Web (Figure 1).

Our approach is based on converting a Web-linked database to a canonical form, such as a decision table, and representing queries as a DD structure (Figure 2). Table 1 shows the correspondence of the terminology for the DD elements.

The underlying approach typically involves variables (features), $x$, and a response, $f$. In the following, we consider the $m$-valued logic function $f: A^n \to B$ over the variable set $X = \{x_1, \ldots, x_n\}$, where $A = \{0, \ldots, r-1\}$ and $B = \{0, \ldots, m-1\}$. Here, $n$ is the number of $r$-valued variables.



http://www.elshopsoft.com/download/samples/ 



Example 2. The company from Example 1 sells the following car modifications:

f=0: Ford Tourneo 2.01 - minibus, petrol engine, manual gear, velour interior, controllable catalytic converter, fuel spent 10.9, price 28,900;

f=1: Ford Escort 1.81 - town car, diesel or diesel/turbo engine, manual gear, cherry color, velour or leather interior, fuel spent 6.4, price 19,900 - 20,300;

f=2: Mercedes V 230 - minibus, petrol or petrol/turbo engine, automatic gear, white color, velour interior, controllable catalytic converter, fuel spent 11.6, price 36,600;

f=3: Mercedes 300TD - town car, diesel engine, automatic gear, white color, velour interior, fuel spent 8.4, price 27,500;

f=4: Mitsubishi Pajero 3000 V6 - off-road, diesel/turbo engine, manual or automatic gear, white color, leather interior, fuel spent 13.7, price 24,800 - 25,600;

f=5: Mitsubishi L300D - minibus, diesel engine, manual gear, metallic color, leather interior, fuel spent 9.8, price 25,700;

f=6: Nissan Terrano II - off-road, petrol engine, manual gear, metallic color, velour or leather interior, controllable catalytic converter, fuel spent 11.1, price 24,600;

f=7: Nissan Primera 2.0SLX - town car, diesel or diesel/turbo engine, automatic gear, velour interior, fuel spent 7.9 - 8.2, price 18,350.

The characteristics of the cars are described by the multiple-valued variables:

x1: catalytic converter - none (0), controllable (1);

x2: color - black (0), cherry (1), metallic (2), white (3);

x3: engine - petrol (0), diesel (1), petrol/turbo (2), diesel/turbo (3);

x4: interior - leather (0), velour (1);

x5: gear - manual (0), automatic (1);

x6: fuel spent - less than 8.0 (0), between 8.0 and 10.0 (1), between 10.0 and 12.0 (2), greater than 12.0 (3);

x7: price - less than 20,000 (0), between 20,000 and 25,000 (1), between 25,000 and 30,000 (2), greater than 30,000 (3);

x8: purpose - minibus (0), off-road (1), town car (2).

Table 1. Terminology relationship between logic and database functions

Logic Function              Database
Variable x                  Characteristic of the product
Function f                  Range of the proposed products
Variable value x = a        An alternative
Function value f = b        Product identifier
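The interval encodings of the fuel and price variables in Example 2 can be sketched in Python as follows. This is only an illustration: the text does not say which class a value exactly on a boundary belongs to, so the sketch assumes it goes to the upper class, and the function and constant names are hypothetical.

```python
from bisect import bisect_right

# Interval boundaries taken from the variable definitions of Example 2.
FUEL_BOUNDS = [8.0, 10.0, 12.0]            # fuel spent: 4 classes, 0..3
PRICE_BOUNDS = [20_000, 25_000, 30_000]    # price: 4 classes, 0..3


def encode_interval(value, bounds):
    """Return the index of the interval that contains value.

    Assumption (not stated in the text): a value equal to a boundary
    is assigned to the upper interval.
    """
    return bisect_right(bounds, value)


def encode_fuel(fuel_spent):
    """Map a fuel consumption figure to the multiple-valued fuel variable."""
    return encode_interval(fuel_spent, FUEL_BOUNDS)


def encode_price(price):
    """Map a price to the multiple-valued price variable."""
    return encode_interval(price, PRICE_BOUNDS)
```

For instance, the first car of Example 2 (fuel spent 10.9, price 28,900) falls into class 2 for both variables under this encoding.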

3 Database and Logic Function 

3.1 Database Decomposition 

Let us investigate the decomposition of database information. This can be represented as the decomposition of a logic function $f$ with respect to a variable $x_i$ into uniquely determined sub-functions, so that it is possible to reconstruct $f$ if the sub-functions are known. For a logic function $f$, $f_c = f|_{x_i=c} = f(x_1, \ldots, x_{i-1}, c, x_{i+1}, \ldots, x_n)$ is called a cofactor or sub-function of $f$ when $x_i$ is fixed to $c \in \{0, \ldots, r-1\}$.

Definition 1. A Decomposition of a function $f$ is defined as $f = Decomposition(x, f_0, \ldots, f_{r-1})$, such that for $\forall x \in X$, there exist $r$ uniquely determined cofactors $f_0, \ldots, f_{r-1}$.
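As a minimal sketch of this decomposition, a decision table can be split into its cofactors by grouping rows on the value of the chosen variable. The table layout (a list of `(assignment, value)` pairs), the function name `cofactors`, and the toy data are assumptions made for illustration; they are not taken from the cars database.

```python
from collections import defaultdict


def cofactors(table, var):
    """Split a decision table into sub-tables, one per value of variable var.

    table: list of (assignment, value) pairs, where assignment is a tuple
           of variable values and value is the function output.
    var:   position of the decomposition variable inside each assignment.
    Returns a dict mapping each observed value c to the rows with x_var = c.
    """
    parts = defaultdict(list)
    for assignment, value in table:
        parts[assignment[var]].append((assignment, value))
    return dict(parts)


# Toy two-variable function (hypothetical data).
toy = [((0, 0), 0), ((0, 1), 1), ((1, 0), 2), ((1, 1), 2)]
by_x0 = cofactors(toy, 0)
```

Because every row lands in exactly one cofactor, concatenating the sub-tables recovers the original table, which mirrors the requirement that $f$ can be reconstructed from its sub-functions.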

3.2 Representation of Logic Functions 

Any logic function $f$ can be uniquely determined by a truth table on $k$ combinations of variable values. In decision making applications, the term decision table is used instead of truth table.

Example 3. The decision table for the database from Example 1 is given in Table 2 (k = 19).

3.2.1 Decision Diagrams and Graph-Based Notations 

Decision Trees (DTs) and Decision Diagrams (DDs) are graph-based structures which have become the advanced structures in Logic Design and Decision Making for representing and manipulating functions and discrete data [Jl], [!(]]. The core of these data structures is a directed acyclic graph which forms a canonical representation of a given function.

Table 2. Truth table of logic function f from Example 2

Model                       x1 x2 x3 x4 x5 x6 x7 x8     f
Ford Tourneo 2.0            1 10 10 2 2                 0
                            [illegible]                 0
                            1 2 1 2 2                   0
                            1 3 1 2 2                   0
Ford Escort 1.8             1 1 1 2                     1
                            13 11112                    1
Mercedes V230               1 3 1 2 3                   2
                            1 3 1 1 2 3                 2
                            1 3 2 1 1 2 3               2
Mercedes 300TD              3 11112 2                   3
Mitsubishi Pajero 3000 V6   3 3 3 1 1                   4
                            3 3 1 3 2 1                 4
Mitsubishi L300D            2 1 1 2                     5
Nissan Terrano II           1 2 2 1 1                   6
                            1 2 1 2 1 1                 6
Nissan Primera 2.0SLX       1 1 1 2                     7
                            13 1110 2                   7
                            2 3 1 1 1 2                 7
                            3 3 1 1 1 2                 7

Definition 2. A Decision Diagram is a connected directed acyclic graph with a vertex (node) set and an edge set where:

(i) Each non-terminal vertex is labeled by a variable $x$ and assigned as a decision variable. Such a vertex also corresponds to a decomposition step of the function $f$ into sub-functions (outgoing edges: $edge_1, \ldots, edge_r$) with respect to the variable $x$.

(ii) A terminal vertex is labeled with the leaf value and has no successors, whereas a non-terminal vertex has exactly $r$ successors for an $r$-valued variable.

(iii) When reduction is performed: (i) any node with identical successors $DD_1, \ldots, DD_r$ is removed (Figure 3(a)); (ii) two nodes with isomorphic DDs are merged (Figure 3(b)).

(iv) A DD is called ordered if the variables appear in the same order on each path from the root to a terminal vertex. A DD is called free if the order of variables is free along each path from the root to a terminal vertex. In other words, the term 'free' means that different variables and expansion types can occur at every level of the DD.




Figure 3. Reduction rules for DD design

Free DDs allow more efficient representation while 
keeping (nearly) all the properties of ordered DDs fl]. We 
deal with free DDs only so the term 'free' will be omitted. 

Example 4. The graph in Figure 4(a) represents an ordered DD for the function $f$ (Example 2). The DD in Figure 4(b) is free. The effect of reducing the number of DD nodes is demonstrated.
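The two reduction rules of Definition 2 can be illustrated with a small Python sketch. It uses a table of already-built nodes so that isomorphic sub-diagrams are shared; this is one common way to implement such rules and is an assumption here, not necessarily the implementation used for the experiments. The names `make_reducer`, `node` and `leaf` are hypothetical.

```python
def make_reducer():
    """Return (node, leaf) constructors that apply both reduction rules."""
    unique = {}   # (variable, successor identities) -> shared node
    leaves = {}   # leaf value -> shared terminal vertex

    def leaf(value):
        # Terminal vertices with the same value are shared.
        return leaves.setdefault(value, ("leaf", value))

    def node(var, children):
        children = tuple(children)
        # First rule: a node whose successors are all identical is removed.
        if all(child is children[0] for child in children):
            return children[0]
        # Second rule: nodes with isomorphic sub-diagrams are merged.
        # Sub-diagrams are already shared, so identity comparison suffices.
        key = (var, tuple(id(child) for child in children))
        return unique.setdefault(key, ("node", var, children))

    return node, leaf


node, leaf = make_reducer()
a = node("x2", [leaf(0), leaf(1)])
b = node("x2", [leaf(0), leaf(1)])   # isomorphic to a, so the same object
c = node("x1", [a, b])               # identical successors, so c collapses to a
```

Building the diagram bottom-up through these constructors keeps it reduced at every step, so no separate reduction pass is needed in this sketch.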

3.2.2 Relation Between Decision Diagrams and 
Database Information 

A DT (or, more generally, a DD) is a chronological representation of the decision process by a network that utilizes two types of nodes: choice nodes, which represent decisions (values of a function), and chance nodes, which represent uncertainty states (variables). Constructing a DD requires building a logical structure for the problem. Here is a sketch of how to design a DD:

1. Draw the DT using choice nodes to represent decisions, and chance nodes to represent uncertainty states.
2. Evaluate the DT to make sure all possible outcomes are included.
3. Reduce the DT to a DD.

We can determine the best decision from the graph by starting from the root and going forward. From the above graph, our decision is as follows:

1. Ask the consumer several questions to discover his or her interest.
2a. If the answers lead to a particular product, then select the product (final decision).
2b. Otherwise, repeat the questions.

Definition 3. An i-path is a path from the root of a DD to a terminal node assigned the logic value i. An x-path is a path from the root of a DD to a terminal node assigned no value.

Each i-path defines a subset of variable values that uniquely corresponds to a record in the initial database [Q].

Example 5. Let us consider the function $f$ given by the DD in Figure 4(b). Its path in bold corresponds to the target $f = 7$ (7-path). It means that during Internet Shopping we will follow this path and choose the Nissan Primera 2.0SLX.
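Following a path through a DD amounts to asking one question per level until a terminal node is reached. The sketch below assumes a hypothetical nested representation, in which a non-terminal node is a pair `(variable, {value: child})` and a terminal node is a product identifier or `None`. The small diagram `toy_dd` is an invented fragment that is merely consistent with Example 2; it is not the DD of Figure 4.

```python
def follow_path(dd, answers):
    """Walk from the root to a terminal node, asking one question per level.

    dd:      nested structure; a non-terminal node is (variable, {value: child})
             and a terminal node is a product identifier (int) or None.
    answers: dict mapping variable names to the values chosen by the customer.
    Returns (product identifier or None, list of variables asked).
    """
    asked = []
    current = dd
    while isinstance(current, tuple):
        var, children = current
        asked.append(var)
        # A missing edge means no product matches (an x-path).
        current = children.get(answers[var])
    return current, asked


# Hypothetical fragment: price first, then gear for the cheapest class.
toy_dd = ("x7", {0: ("x5", {0: 1, 1: 7}), 3: 2})
product, asked = follow_path(toy_dd, {"x7": 0, "x5": 1})
```

Here the customer who wants a car below 20,000 with automatic gear reaches product 7 after two questions, and the list `asked` records the sequence of queries that was needed.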

The major problem is to choose the variables for DD design so as to optimize Web database access by minimizing the number of DD levels for quick search and reducing the size of the DD for efficient memory allocation. This problem can be solved using information theoretic measures as optimization criteria.

3.3 Information Theory and Optimization 

In order to quantify the content of information for a finite field of events $A = \{a_1, a_2, \ldots, a_n\}$ with probability distribution $\{p(a_i)\}$, $i = 1, 2, \ldots, n$, Shannon introduced the concept of entropy. The entropy of the finite field $A$ is given by (logarithm is base 2)

$$H(A) = -\sum_{i=1}^{n} p(a_i) \log p(a_i). \qquad (1)$$

Suppose there are two finite fields of events $A$ and $B$ with probability distributions $\{p(a_i)\}$, $i = 1, 2, \ldots, n$, and $\{p(b_j)\}$, $j = 1, 2, \ldots, m$, respectively. Let $p(a_i, b_j)$ be the probability of the joint occurrence of $a_i$ and $b_j$. For any particular value $b_j$ that $B$ can assume, there is a conditional probability $p(a_i|b_j)$ that $A$ has the value $a_i$. It is expressed by $p(a_i|b_j) = p(a_i, b_j) / p(b_j)$. The conditional entropy of $A$ given $B$ is defined by

$$H(A|B) = -\sum_{i=1}^{n} \sum_{j=1}^{m} p(a_i, b_j) \log p(a_i|b_j). \qquad (2)$$

Here, we deal with two finite fields: the set of values of the function $f$ and the set of values of a variable $x$. We calculate the probability $p|_{f=b} = k|_{f=b} / k$, where $k|_{f=b}$ is the number of assignments of values to variables (patterns) for which $f = b$ and $k$ is the total number of assignments. Other probabilities are calculated in the same way.
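Equations (1) and (2), with probabilities estimated as relative frequencies of rows in the decision table, can be sketched as follows. The table layout and the function names are assumptions for illustration, and the toy table is hypothetical rather than the cars database.

```python
from collections import Counter
from math import log2


def entropy(values):
    """H(f) for a list of function values, with p(f=b) = k_{f=b} / k."""
    k = len(values)
    return -sum(n / k * log2(n / k) for n in Counter(values).values())


def conditional_entropy(table, var):
    """H(f | x_var) for a decision table of (assignment, value) rows."""
    k = len(table)
    groups = {}
    for assignment, value in table:
        groups.setdefault(assignment[var], []).append(value)
    # Sum over c of p(x=c) * H(f | x=c), which is equivalent to Equation (2).
    return sum(len(vals) / k * entropy(vals) for vals in groups.values())


# Toy two-variable function (hypothetical data).
toy = [((0, 0), 0), ((0, 1), 1), ((1, 0), 2), ((1, 1), 2)]
h_f = entropy([value for _, value in toy])
h_f_x0 = conditional_entropy(toy, 0)
h_f_x1 = conditional_entropy(toy, 1)
```

In this toy case fixing the first variable leaves less uncertainty about $f$ than fixing the second one, so the first variable is the more informative question.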

Example 6. Consider the function $f$ from Example 2. The probabilities of its output values are $p|_{f=0} = p|_{f=7} = 4/19$, $p|_{f=1} = p|_{f=4} = p|_{f=6} = 2/19$ and $p|_{f=2} = p|_{f=3} = p|_{f=5} = 1/19$. The entropy of the function is $H(f) = -2 \cdot \frac{4}{19} \log_2 \frac{4}{19} - 3 \cdot \frac{2}{19} \log_2 \frac{2}{19} - 3 \cdot \frac{1}{19} \log_2 \frac{1}{19} = 2.64$ bit. The conditional entropy of the function $f$ with respect to variable $x_1$ is $H(f|x_1) = \frac{9}{19} \cdot 1.24 + \frac{10}{19} \cdot 1.61 = 1.43$ bit, and also $H(f|x_2) = 1.08$ bit, $H(f|x_3) = 1.00$ bit, $H(f|x_4) = 1.80$ bit, $H(f|x_5) = 1.53$ bit, $H(f|x_6) = 1.01$ bit, $H(f|x_7) = 0.84$ bit, $H(f|x_8) = 0.99$ bit.

We utilize the presented information theoretic measures for optimization of database access. The criterion to choose a decomposition variable $x$ for an arbitrary level of the DD is that the conditional entropy of the function with respect to this variable has to be minimal:

$$H(f|x) = \min(H(f|x_i) \mid \forall x_i). \qquad (3)$$

Figure 4. Samples of (a) ordered and (b) free DDs, and resultant DDs produced (c) by greedy algorithm InfoGreedy and (d) by iteration algorithm InfoIter (second iteration) for the function f from Example 2

As a measure of cost, we use the number of levels and the 
number of nodes in the final DD. This choice is motivated 
by the major optimization objectives in Internet Shopping, 
related to reduction of number of queries and overall mem- 
ory size for DD allocation. 

The main reasons for using information theoretic mea- 
sures to optimize data access are: 

1. The behaviour of the entropy function is close to the behaviour of such parameters as the number of nodes and the number of levels in a DD. The results from [|[| show the dependence of the number of nodes in a DT expression upon the entropy function.

2. The choice in each particular case is mainly justified by 
the uncertainty of decision making whose estimation is 
closely related to entropy measures. This implies that 
the sequence with less uncertainty (DDs) should be de- 
signed taking into consideration the entropy criterion. 

3. The results of optimization are very sensitive to vari- 
able ordering, e.g. the number of nodes may vary from 
linear to exponential [|l0|]. 

Next, we present a simple example to compare a classical 
method and an entropy-based method of variable ordering. 

4 Algorithms to Optimize Database Access 

Generally, our algorithm to optimize database access proceeds as follows:

o Initially, a canonical representation, i.e. a truth table, is generated for the given database information as described in Section 3.

o The InfoGreedy (greedy strategy) or InfoIter (iteration strategy) algorithm is applied. The nodes of the DD are assigned variables in accordance with the information theoretic criterion (Equation (3)). The DT is optimized via reduction of the number of nodes.

o The sequence of queries is formed according to the constructed DD.

4.1 Greedy Strategy - Simple Case 

First, we describe a greedy algorithm to optimize database access according to an information criterion. A sketch of the algorithm is given in Figure 5.

The basic idea here is that we employ recursion when constructing DDs. The ordering restriction is relaxed, i.e., (i) each variable appears once on each path and (ii) the order of variables along each path may be different [Q]. Our greedy algorithm for logic function minimization is:

Stage 1. At each step of DD design, i.e. attaching a current node to the DD, the information theoretic measures for decomposition are calculated for each variable.

Stage 2. The variable $x$ that corresponds to the minimal $H(f|x)$ is assigned to the current DD node.

Stage 3. Sub-DDs for the sub-functions (outgoing edges of the current DD node) obtained by decomposition with respect to variable $x$ are recursively constructed.

Stage 4. The algorithm terminates when the leaves are reached for each sub-DD (the DD is completed) for the given logic function $f$.



Input (InfoGreedy): logic function f
Output: DD, a Decision Diagram

Input (InfoIter): logic function f, number of iterations Iter
Output: DD, a Decision Diagram



InfoGreedy(f)
{
    if (f = c, where c = const) then {
        DD <- leaf(c); return;
    }
    for (∀x_i)
        Calculate information measures H(f|x_i);
    Choose variable x where:
        H(f|x) = min(H(f|x_i) | ∀x_i);
    Attach node assigned by variable x to DD:
        DD <- node(x);
    for (∀f_s of decomposition given variable x)
        Recursively construct the sub-DDs DD_s:
            DD_s = InfoGreedy(f_s);
    return;
}

Figure 5. Sketch of the InfoGreedy algorithm to realize greedy strategy



The obtained DD is shown in Figure 4(c). The number of non-terminal nodes is four and the maximum number of levels is three.
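The greedy strategy can be sketched in Python as below. This is a simplified illustration under stated assumptions: the function is given as a list of `(assignment, value)` rows, the result is a decision tree in a hypothetical nested form `(variable, {value: subtree})`, ties are broken by list order, and the reduction of the tree to a DD is left out. The toy table is invented and is not the cars database.

```python
from collections import Counter
from math import log2


def entropy(values):
    k = len(values)
    return -sum(n / k * log2(n / k) for n in Counter(values).values())


def split(table, var):
    """Group rows of (assignment, value) by the value of variable var."""
    groups = {}
    for row in table:
        groups.setdefault(row[0][var], []).append(row)
    return groups


def cond_entropy(table, var):
    return sum(len(g) / len(table) * entropy([v for _, v in g])
               for g in split(table, var).values())


def info_greedy(table, variables):
    """Build a free decision tree as nested (var, {value: subtree}) tuples."""
    values = {v for _, v in table}
    if len(values) == 1:            # f is constant: attach a leaf
        return values.pop()
    if not variables:               # rows cannot be separated any further
        return sorted(values)
    # Equation (3): pick the variable with minimal conditional entropy.
    best = min(variables, key=lambda x: cond_entropy(table, x))
    rest = [x for x in variables if x != best]
    return (best, {c: info_greedy(sub, rest)
                   for c, sub in split(table, best).items()})


toy = [((0, 0), 0), ((0, 1), 1), ((1, 0), 2), ((1, 1), 2)]
tree = info_greedy(toy, [0, 1])
```

Because the variable is chosen separately in each branch, different paths of the resulting tree may test variables in different orders, which corresponds to the relaxed ordering restriction described above.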

4.2 Iteration Strategy 

We present below an extension of the greedy algorithm that can be used in practical applications. Ranking the variables $x_1, \ldots, x_n$ by the information theoretic criterion is intended to improve the characteristics of the greedy strategy and optimize Web access.

1. During the calculation of information measures, we store the list of variables ranked by increasing $H(f|x)$: $x_{j_1}, \ldots, x_{j_t}, \ldots, x_{j_n}$, so that $H(f|x_{j_t}) \le H(f|x_{j_{t+1}})$ (Figure 7(b)), in contrast to the lexicographical (naive) order (Figure 7(a)).

2. At each iteration, we choose the variable from the list $x_{j_1}, \ldots, x_{j_t}, \ldots, x_{j_n}$ corresponding to the current iteration.

We add the number of iterations as input data for the extended algorithm (Figure 6). Such an improvement of the basic algorithm does not guarantee the minimal solution, but one near the minimum. It is easy to show that the algorithm InfoIter with parameter Iter = 1 realizes the greedy strategy. We can obtain results that are nearer to the exact ones by increasing the number of iterations:

Algorithm   InfoGreedy <-- InfoIter --> Exact
Iter =      1 ... 10 ... 100
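One possible reading of the iteration strategy is sketched below. The text does not fully specify how the iteration number selects variables, so the sketch makes explicit assumptions: iteration t places the t-th ranked variable at the root, the greedy choice is used below the root, and the cost criterion is the number of non-terminal nodes of the resulting tree. All names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    k = len(values)
    return -sum(n / k * log2(n / k) for n in Counter(values).values())


def split(table, var):
    groups = {}
    for row in table:
        groups.setdefault(row[0][var], []).append(row)
    return groups


def cond_entropy(table, var):
    return sum(len(g) / len(table) * entropy([v for _, v in g])
               for g in split(table, var).values())


def build(table, variables, rank=0):
    """Tree whose root uses the variable at position rank of the ranked list."""
    values = {v for _, v in table}
    if len(values) == 1:
        return values.pop()
    if not variables:
        return sorted(values)
    ranked = sorted(variables, key=lambda x: cond_entropy(table, x))
    chosen = ranked[min(rank, len(ranked) - 1)]
    rest = [x for x in variables if x != chosen]
    # Assumption: below the root the greedy choice (rank 0) is used.
    return (chosen, {c: build(sub, rest)
                     for c, sub in split(table, chosen).items()})


def size(tree):
    """Cost criterion assumed here: number of non-terminal nodes."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + sum(size(child) for child in tree[1].values())


def info_iter(table, variables, iterations):
    """Try one ranked root variable per iteration and keep the smallest tree."""
    candidates = [build(table, variables, rank) for rank in range(iterations)]
    return min(candidates, key=size)


toy = [((0, 0), 0), ((0, 1), 1), ((1, 0), 2), ((1, 1), 2)]
best = info_iter(toy, [0, 1], 2)
```

With `iterations = 1` only the top-ranked variable is tried, so the sketch coincides with the greedy strategy, in line with the remark about Iter = 1.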

Example 7. Let us consider how the algorithm InfoIter runs for the function $f$ from Example 2. At the second iteration, we obtain the DD in Figure 4(d). We can conclude that three requests will be enough to explore all car modifications proposed by the company. Firstly, we should generate a query that contains a question about the customer's ability to pay ($x_7$), then either gear preferences ($x_5$) or car purpose ($x_8$), and fuel spent ($x_6$).

for (iter = 1; iter <= Iter; iter++) {
    InfoIter(f)
    {
        if (f = c, where c = const) then {
            DD <- leaf(c); return;
        }
        for (∀x_i)
            Calculate information measures H(f|x_i);
        Rank the variables x_i by increasing H(f|x_i);
        Choose variable x from the ranked list;
        Attach node assigned by variable x to DD:
            DD <- node(x);
        for (∀f_s of decomposition given variable x)
            Recursively construct the sub-DDs DD_s:
                DD_s = InfoIter(f_s);
        return;
    }
    Store minimal DD, according to cost criterion
}

Figure 6. Sketch of the InfoIter algorithm to realize iteration strategy

(a) $x_1, \ldots, x_n$
(b) $H(f|x_{j_1}) \le \ldots \le H(f|x_{j_t}) \le \ldots \le H(f|x_{j_n})$

Figure 7. Lexicographical (naive) order and the order based on ranking the variables in accordance with their information measures

5 Experiments and Practical Benefits 

In the first series of experiments with the algorithms InfoGreedy and InfoIter for decision making, Machine Learning benchmarks were used (Table 3). In this table, N/level/t means the number of DT/DD nodes, the number of DT/DD levels and the run-time in CPU seconds (Pentium III 650 MHz, 48 MB). We set Iter = 10 for InfoIter.

Table 3. Results of InfoGreedy and InfoIter in decision making applications

                           InfoGreedy                     InfoIter
Test       r = m   k       DT            DD              DT             DD
                           N/level/t     N/level/t       N/level/t      N/level/t
shuttle    4       1695    740/6/8.31    651/6/10.25     740/6/8.31     651/6/10.25
monks1te   4       432     10/3/0.26     10/3/0.26       10/3/0.26      10/3/0.26
monks1tr   4       124     17/5/0.05     15/5/0.19       13/3/0.24      11/3/1.84
monks2te   4       432     10/3/0.26     10/3/0.26       10/3/0.26      10/3/0.26
monks2tr   4       169     85/6/0.02     78/6/0.12       79/6/0.55      71/6/1.13
monks3te   4       432     73/5/0.56     36/4/2.88       5/3/1.68       5/3/1.68
monks3tr   4       122     39/5/0.07     32/5/0.75       22/5/0.62      19/5/2.39
Total                      974/33/9.53   832/32/14.71    879/29/11.92   777/29/17.81

Observation 1. The InfoIter algorithm produces DTs with about 10% fewer nodes (DDs with about 7% fewer nodes) and about 12% fewer levels (9% fewer levels for DDs) than InfoGreedy does.

Observation 2. InfoIter with DDs output gives about 12% fewer nodes than InfoIter with DTs output.
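The percentages in Observations 1 and 2 follow from the totals in the last row of Table 3. The short script below reproduces the arithmetic; the dictionary layout and the helper name `reduction` are ours.

```python
# Totals (nodes, levels) from the last row of Table 3.
totals = {
    ("InfoGreedy", "DT"): (974, 33),
    ("InfoGreedy", "DD"): (832, 32),
    ("InfoIter", "DT"): (879, 29),
    ("InfoIter", "DD"): (777, 29),
}


def reduction(before, after):
    """Relative reduction in percent."""
    return 100.0 * (before - after) / before


# Observation 1: InfoIter versus InfoGreedy, for DTs and for DDs.
for kind in ("DT", "DD"):
    g_nodes, g_levels = totals[("InfoGreedy", kind)]
    i_nodes, i_levels = totals[("InfoIter", kind)]
    print(kind,
          "nodes: %.1f%%" % reduction(g_nodes, i_nodes),
          "levels: %.1f%%" % reduction(g_levels, i_levels))

# Observation 2: InfoIter DDs versus InfoIter DTs (nodes only).
dt_nodes = totals[("InfoIter", "DT")][0]
dd_nodes = totals[("InfoIter", "DD")][0]
print("DD vs DT nodes: %.1f%%" % reduction(dt_nodes, dd_nodes))
```

The computed reductions round to the figures quoted in the two observations.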

In the second series of experiments, we tested the proposed algorithms on different Internet market examples. The optimization results provide friendlier and faster user interactions.

Possible benefits from using DDs are: more compact database representations and faster access, better optimization using different criteria (DD size, number of levels), and flexibility in developing and updating electronic catalogs. Internet shopping is an example application where Web site customers will be able to buy a product using intuitive navigation provided by DDs, since hierarchical data representation is similar to the way decisions are made.

6 Concluding Remarks 

In this paper, we addressed the problem of optimiz- 
ing database access by using hierarchical organization of 
database information. We have developed computer-aided 
support for the easy construction of DDs and DTs. The 
optimization methods using graph-based structures have 
found wide applicability not only in E-commerce applica- 
tions but also in logic design, computer-aided diagnostics in 
medicine, and other decision making problems [p^|. 

The algorithms produce efficient DDs using an information theoretic approach. They provide significant improvement in the number of queries needed to extract information from a database. The experimental results are encouraging and the algorithms are easy to implement.